arabic word
From A for algebra to T for tariffs: Arabic words used in English speech
Arabic is one of the world's most widely spoken languages, with at least 400 million speakers, including 200 million native speakers and 200 million to 250 million non-native speakers. Modern Standard Arabic (MSA) serves as the formal language for government, legal matters and education, and it is widely used in international and religious contexts. Additionally, more than 25 dialects are spoken, primarily across the Middle East and North Africa. World Arabic Language Day, observed on December 18, marks the day in 1973 on which the UN General Assembly adopted Arabic as one of its six official languages. In the following visual explainer, Al Jazeera lists some of the most common words in today's English language that originated from Arabic or passed through Arabic before reaching English.
- North America > United States (0.51)
- South America (0.41)
- North America > Central America (0.41)
- Law (0.36)
- Government (0.35)
Building and Aligning Comparable Corpora
Saad, Motaz, Langlois, David, Smaili, Kamel
A comparable corpus is a set of topic-aligned documents in multiple languages that are not necessarily translations of each other. These documents are useful for multilingual natural language processing when no parallel text is available in some domains or languages. In addition, comparable documents are informative because they show what is being said about a topic in different languages. In this paper, we present a method to build comparable corpora from the Wikipedia encyclopedia and the EURONEWS website in English, French and Arabic. We further experiment with a method to automatically align comparable documents using cross-lingual similarity measures. We investigate two such measures: the first is based on a bilingual dictionary, and the second is based on Latent Semantic Indexing (LSI). Experiments on several corpora show that the Cross-Lingual LSI (CL-LSI) measure outperforms the dictionary-based measure. Finally, we collect English and Arabic news documents from the British Broadcasting Corporation (BBC) and from the ALJAZEERA (JSC) news website, respectively. We then use the CL-LSI similarity measure to automatically align comparable documents from BBC and JSC. The evaluation of the alignment shows that CL-LSI is able to align cross-lingual documents not only at the topic level but also at the event level.
- Europe > France (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Bulgaria (0.04)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
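The dictionary-based measure mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this version makes its own assumptions: each source document is mapped word by word through a bilingual dictionary, and the result is compared with each target document by cosine similarity over word counts. The toy French-to-English dictionary `TOY_DICT` and the function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy French-to-English dictionary (assumption, not from the paper).
TOY_DICT = {"chat": ["cat"], "noir": ["black"], "dort": ["sleeps"], "chien": ["dog"]}


def translate_bag(tokens, dictionary):
    """Map source tokens to target-language word counts; unknown words are dropped."""
    bag = Counter()
    for tok in tokens:
        for translation in dictionary.get(tok, []):
            bag[translation] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two word-count mappings."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dictionary_similarity(src_tokens, tgt_tokens, dictionary):
    """Cross-lingual similarity: translate the source side, then compare bags of words."""
    return cosine(translate_bag(src_tokens, dictionary), Counter(tgt_tokens))


def align(src_docs, tgt_docs, dictionary):
    """For each source document, return the index of the most similar target document."""
    return [
        max(range(len(tgt_docs)),
            key=lambda j: dictionary_similarity(src, tgt_docs[j], dictionary))
        for src in src_docs
    ]


src_docs = [["chat", "noir"], ["chien", "dort"]]
tgt_docs = [["the", "dog", "sleeps"], ["a", "black", "cat"]]
alignment = align(src_docs, tgt_docs, TOY_DICT)
```

A measure of this kind depends on dictionary coverage, since out-of-dictionary words contribute nothing to the score. That limitation may be one reason a latent-space measure such as CL-LSI can do better, although the abstract itself does not say so.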
ArEEG_Words: Dataset for Envisioned Speech Recognition using EEG for Arabic Words
Darwish, Hazem, Malah, Abdalrahman Al, Jallad, Khloud Al, Ghneim, Nada
Brain-computer interface (BCI) research aims to support communication-impaired patients by translating neural signals into speech. A notable research topic in BCI involves electroencephalography (EEG) signals, which measure the electrical activity in the brain. While significant advancements have been made in BCI EEG research, a major limitation still exists: the scarcity of publicly available EEG datasets for non-English languages, such as Arabic. To address this gap, we introduce the ArEEG_Words dataset, a novel EEG dataset recorded from 22 participants with a mean age of 22 years (5 female, 17 male) using a 14-channel Emotiv Epoc X device. The participants were asked to avoid anything that affects the nervous system, such as coffee, alcohol, and cigarettes, for 8 hours before recording. They were asked to stay calm in a calm room while imagining one of 16 Arabic words for 10 seconds. The 16 words are commonly used ones such as up, down, left, and right. A total of 352 EEG recordings were collected, and each recording was then divided into multiple 250 ms signals, resulting in a total of 15,360 EEG signals. To the best of our knowledge, ArEEG_Words is the first dataset of its kind in the Arabic EEG domain. Moreover, it is publicly available to researchers, and we hope that it will fill the gap in Arabic EEG research.
- Europe > Portugal (0.04)
- Europe > Netherlands (0.04)
- Europe > Belgium > Flanders (0.04)
- Asia > Middle East > Syria (0.04)
- Health & Medicine > Therapeutic Area > Neurology (0.69)
- Leisure & Entertainment > Sports > Golf (0.48)
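The segmentation step described in the ArEEG_Words abstract, dividing each recording into 250 ms signals, can be sketched as follows. The abstract states neither the sampling rate nor whether windows overlap, so the 128 Hz rate and the non-overlapping windows below are assumptions, and the function name `segment` is hypothetical. The number of segments obtained in practice depends on those details.

```python
def segment(signal, sampling_rate_hz, window_ms=250):
    """Split one channel (a list of samples) into non-overlapping fixed-length windows.

    Trailing samples that do not fill a whole window are discarded.
    """
    window_len = int(sampling_rate_hz * window_ms / 1000)
    if window_len <= 0:
        raise ValueError("window is shorter than one sample")
    n_windows = len(signal) // window_len
    return [signal[i * window_len:(i + 1) * window_len] for i in range(n_windows)]


# Assumed values for illustration only: 128 Hz sampling, one 10-second channel.
ASSUMED_RATE_HZ = 128
recording = [0.0] * (ASSUMED_RATE_HZ * 10)
windows = segment(recording, ASSUMED_RATE_HZ)
```

For a multi-channel device, the same function would be applied to each channel separately, or the slicing would be done on a samples-by-channels array.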
Crowdsourcing Lexical Diversity
Khalilia, Hadi, Otterbacher, Jahna, Bella, Gabor, Noortyani, Rusma, Darma, Shandy, Giunchiglia, Fausto
Lexical-semantic resources (LSRs), such as online lexicons or wordnets, are fundamental for natural language processing applications. In many languages, however, such resources suffer from quality issues: incorrect entries, incompleteness, but also, the rarely addressed issue of bias towards the English language and Anglo-Saxon culture. Such bias manifests itself in the absence of concepts specific to the language or culture at hand, the presence of foreign (Anglo-Saxon) concepts, as well as in the lack of an explicit indication of untranslatability, also known as cross-lingual lexical gaps, when a term has no equivalent in another language. This paper proposes a novel crowdsourcing methodology for reducing bias in LSRs. Crowd workers compare lexemes from two languages, focusing on domains rich in lexical diversity, such as kinship or food. Our LingoGap crowdsourcing tool facilitates comparisons through microtasks identifying equivalent terms, language-specific terms, and lexical gaps across languages. We validated our method by applying it to two case studies focused on food-related terminology: (1) English and Arabic, and (2) Standard Indonesian and Banjarese. These experiments identified 2,140 lexical gaps in the first case study and 951 in the second. The success of these experiments confirmed the usability of our method and tool for future large-scale lexicon enrichment tasks.
- Europe > United Kingdom > UK North Sea (0.05)
- Atlantic Ocean > North Atlantic Ocean > North Sea > UK North Sea (0.05)
- Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
Arabic Handwritten Text Line Dataset
Segmentation of Arabic manuscripts into lines of text and words is an important step in making recognition systems more efficient and accurate. The problem of segmentation into text lines is solved, since there are carefully annotated datasets dedicated to this task. However, to the best of our knowledge, there is no dataset annotating word positions in Arabic texts. In this paper, we present a new dataset specifically designed for historical Arabic script in which we annotate positions at the word level.
- Africa > Middle East > Algeria > Béjaïa Province > Béjaïa (0.06)
- North America > United States > New York > Niagara County > Niagara Falls (0.05)
- North America > United States > Massachusetts > Suffolk County > Boston (0.05)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
ARCOQ: Arabic Closest Opposite Questions Dataset
Rizkallah, Sandra, Atiya, Amir F., Shaheen, Samir
This paper presents a dataset for closest opposite questions in the Arabic language. The dataset is the first of its kind for Arabic. It is beneficial for assessing systems on antonymy detection. The structure is similar to that of the Graduate Record Examination (GRE) closest opposite questions dataset for the English language. The introduced dataset consists of 500 questions, each containing a query word for which the closest opposite needs to be determined from among a set of candidate words. Each question is also associated with the correct answer. We publish the dataset publicly and provide standard splits of the dataset into development and test sets. Moreover, the paper provides a benchmark for the performance of different Arabic word embedding models on the introduced dataset.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Vietnam > Thái Nguyên Province > Thái Nguyên (0.04)
- Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
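The question format described in the ARCOQ abstract (a query word, a set of candidates, and one correct answer) lends itself to a simple accuracy evaluation, sketched below. The abstract does not say how the benchmarked embedding models choose an answer, so the rule used here, picking the candidate with the lowest cosine similarity to the query, is an assumption and a deliberately naive one. The toy two-dimensional vectors, the English placeholder words, and the dictionary keys `query`, `candidates`, and `answer` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_opposite(query, candidates, vectors):
    """Assumed heuristic: the candidate least similar to the query is the answer."""
    return min(candidates, key=lambda c: cosine(vectors[query], vectors[c]))


def accuracy(questions, vectors):
    """Fraction of questions for which the heuristic picks the labelled answer."""
    correct = sum(
        1 for q in questions
        if pick_opposite(q["query"], q["candidates"], vectors) == q["answer"]
    )
    return correct / len(questions)


# Toy embeddings with English placeholders (illustration only).
vectors = {
    "hot": [1.0, 0.0], "cold": [-1.0, 0.0], "warm": [0.9, 0.1],
    "big": [0.2, 1.0], "small": [-0.2, -1.0], "large": [0.3, 0.9],
}
questions = [
    {"query": "hot", "candidates": ["cold", "warm", "big"], "answer": "cold"},
    {"query": "big", "candidates": ["small", "large", "warm"], "answer": "small"},
]
score = accuracy(questions, vectors)
```

Real embeddings often place antonyms close together because they occur in similar contexts, so a lowest-similarity rule may perform poorly in practice.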
ASMDD: Arabic Speech Mispronunciation Detection Dataset
Aly, Salah A., Salah, Abdelrahman, Eraqi, Hesham M.
The largest dataset for Arabic speech mispronunciation detection in Egyptian dialogues is introduced. The dataset is composed of annotated audio files representing the 100 most frequently used words in the Arabic language, pronounced by 100 Egyptian children (aged 2 to 8 years). The dataset was collected and annotated for segmental pronunciation errors by expert listeners.
DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings
Abdul-Mageed, Muhammad, Elbassuoni, Shady, Doughman, Jad, Elmadany, AbdelRahim, Nagoudi, El Moatez Billah, Zoughby, Yorgo, Shaher, Ahmad, Gaba, Iskander, Helal, Ahmed, El-Razzaz, Mohammed
Word embeddings are a core component of modern natural language processing systems, making the ability to thoroughly evaluate them a vital task. We describe DiaLex, a benchmark for intrinsic evaluation of dialectal Arabic word embeddings. DiaLex covers five important Arabic dialects: Algerian, Egyptian, Lebanese, Syrian, and Tunisian. Across these dialects, DiaLex provides a testbank for six syntactic and semantic relations, namely male to female, singular to dual, singular to plural, antonym, comparative, and genitive to past tense. DiaLex thus consists of a collection of word pairs representing each of the six relations in each of the five dialects. To demonstrate the utility of DiaLex, we use it to evaluate a set of existing Arabic word embeddings as well as new ones that we developed. Our benchmark, evaluation code, and new word embedding models will be publicly available.
- Europe > Italy > Tuscany > Florence (0.04)
- Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.04)
- North America > Canada > Quebec > Montreal (0.04)
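One common way to use relation word pairs such as those in DiaLex for intrinsic evaluation is the vector-offset analogy test: given pairs (a, b) and (c, d) from the same relation, check whether the word nearest to b - a + c is d. The abstract does not state that DiaLex is scored exactly this way, so the sketch below is an assumption, and the toy vectors, English placeholder words, and function names are hypothetical.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Return the vocabulary word closest to b - a + c, excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def relation_accuracy(pairs, vectors):
    """Turn every ordered combination of two pairs into a question and score it."""
    questions = list(permutations(pairs, 2))
    hits = sum(
        1 for (a, b), (c, d) in questions
        if solve_analogy(a, b, c, vectors) == d
    )
    return hits / len(questions)


# Toy embeddings for a "male to female" relation (illustration only).
vectors = {
    "man": [1.0, 0.0], "woman": [1.0, 1.0],
    "king": [3.0, 0.0], "queen": [3.0, 1.0],
    "apple": [0.0, -1.0],
}
pairs = [("man", "woman"), ("king", "queen")]
score = relation_accuracy(pairs, vectors)
```

Running the same routine once per relation and per dialect would give a table of accuracies, which is one plausible way to compare embedding models on a benchmark of this shape.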
Using Artificial Intelligence to Read Arabic Comics - Al-Fanar Media
Arabic comics have in recent years grown into a thriving creative movement. BEIRUT--A computer scientist at the American University of Beirut is using artificial intelligence to classify the content of Arabic comics, applying the computer-based science to this cutting-edge art form in the Arab world. Artificial-intelligence specialists are always trying to stretch the capabilities of computer brainpower. If artificial intelligence can be used to play the ancient Chinese board game Go, or the American TV quiz game Jeopardy, then Arabic comics are also fair game. "I try to look for unusual applications for artificial intelligence and machine learning," explained Mariette Awad, the associate professor in the department of electrical and computer engineering at the American University of Beirut who is leading the project.
- Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.68)
- Asia > Middle East > Syria (0.05)
- Asia > Middle East > Iraq (0.05)
- Africa > Middle East > Egypt (0.05)
Building peace through video games in South Sudan
Lual Mayen, a 24-year-old software engineer, is determined to do what he can to bring change to South Sudan, a country ripped apart by civil war. Through the use of board and video games, he wants to promote unity and spread his message of peace throughout the world. "After the conflicts that started in 2013, I saw the horrible effects mass displacement could have with my own eyes. I witnessed it in IDP and refugee camps, but also online," Mayen told Al Jazeera. "These social clubs, both online and offline, were turned into sites for social evils and I could see the conflict brewing among various tribes that were crammed together. I knew that these scenarios could turn political and even physical, with people wanting revenge for what was happening to them."
- Africa > South Sudan (0.73)
- North America > United States > California > San Francisco County > San Francisco (0.07)
- Asia > Middle East > Syria (0.05)